---
layout: post
date: 2023-03-06
title: "Reading a JSON config in Zig"
description: "Reading a JSON configuration file in Zig"
tags: [zig]
---

<p>This post was updated on October 30 2023, to reflect changes to std.json currently on the master branch.</p>

<p>A while ago, I wrote a <a href="https://github.com/karlseguin/websocket.zig">Websocket Server implementation in Zig</a>. I did it just for fun, which means I didn't have to worry about the boilerplate things that go into creating an actual production application.</p>

<p>More recently, I've jumped back into Zig for something a little more serious (but not really). As part of my learning journey, I wanted to add more polish. One of the things I wanted to do was add configuration options, which meant reading a <code>config.json</code> file.</p>

<p>This seemingly simple task presented me with enough challenges that I thought it might be worth a blog post. The difficulties that I ran into are threefold. First, Zig's documentation is, generously, experimental. Second, Zig changes enough from version to version that whatever external documentation I did find, didn't work as-is (I'm writing this using 0.11-dev). Finally, decades of being coddled by garbage collectors have dulled by education.</p>

<p>Here's the skeleton that we'll start with:</p>

{% highlight zig %}
const std = @import("std");

pub fn main() !void {
  const config = try readConfig("config.json");
  std.debug.print("config.root: {s}\n", .{config.root});
}

fn readConfig(path: []const u8) !Config {
  //TODO: all our logic here!

  // since we're not using path yet, we need this to satisfy the compiler
  _ = path;
  return error.NotImplemented;
}

const Config = struct {
  root: []const u8,
};
{% endhighlight %}

<p>It would be a nice addition to be able to specify the path/filename of the configuration via a command line option, such as <code>--config config.json</code>, but Zig has no built-in flag parser, so we'd have fiddle around with <code>std.os.argv</code> or use a third party library.</p>

<p>Also, as a very quick summary of Zig's error handling, the <code>try function(...)</code> syntax is like Go's <code>if err != nil { return err }</code>. Note that our <code>main</code> function returns <code>!void</code> and our <code>readConfig</code> returns <code>!Config</code>. This is called an error union type and it means our functions return either an error or the specified type. We can list an explicit error set (like an enum), in the form of <code>ErrorType!Type</code>. If we want to support any error, we can use the special <code>anyerror</code> type. Finally, we can have zig infer the error set via <code>!Type</code> which, practically speaking, is similar to using <code>anyerror</code>, but is actually just a shorthand form of having an explicit error set (where the error set is automatically inferred at compile-time).</p>

<h3>Reading a File</h3>
<p>The first thing we need to do is read the contents of our file (after which we can parse it). We could explore the available API, but it should be somewhat obvious that this is going to require allocating memory. One of Zig's key feature is explicit memory allocations. This means that any part of the standard library (and hopefully 3rd party libraries) that might allocate memory take a parameter of type <code>std.mem.Allocator</code>. In other words, instead of having a <code>readFile(path) ![]u8</code> function which would allocate memory internally using, say, <code>malloc</code>, we'd expect to find a <code>readFile(allocator, path) ![]u8</code> which would use the supplied <code>allocator</code>.</p>

<p>Hopefully this will make more sense once we look at a concrete example. For now, we'll create an allocator in <code>main</code> and pass it to <code>readConfig</code>:</p>

{% highlight zig %}
const std = @import("std");
const Allocator = std.mem.Allocator;

pub fn main() !void {
  // we have a few choices, but this is a safe default
  var gpa = std.heap.GeneralPurposeAllocator(.{}){};
  const allocator = gpa.allocator();

  const config = try readConfig(allocator, "config.json");
  std.debug.print("config.root: {s}\n", .{config.root});
}

fn readConfig(allocator: Allocator, path: []const u8) !Config {
  //TODO: all our logic here!

  // since we're not using path yet, we need this to satisfy the compiler
  _ = path;
  _ = allocator;
  return error.NotImplemented;
}
{% endhighlight %}

<p>Now to actually read the file, there's a <code>Dir</code> type in the standard library that exposes a <code>readFile</code> and a <code>readFileAlloc</code> method. I find this API a little confusing (though I'm sure there's a reason for it), as these aren't static functions but members of the <code>Dir</code> type. So we need a <code>Dir</code>. Thankfully, we can easily get the <code>Dir</code> of the current working directory using <code>std.fs.cwd()</code>.</p>

<p>If we use <code>readFileAlloc</code>, our code looks like:</p>

{% highlight zig %}
fn readConfig(allocator: Allocator, path: []const u8) !Config {
  const data = try std.fs.cwd().readFileAlloc(allocator, path, 512);
  defer allocator.free(data);
  ...
}
{% endhighlight %}

<p>The last parameter, 512, is the maximum size to allocate/read. Our config is very small, so we're limiting this to 512 bytes. Importantly, this function allocates and returns memory using our provided allocator. We're responsible for freeing this memory, which we'll do when the function returns, using <code>defer</code>.</p>

<p>Alternatively, we could use <code>readFile</code>, which takes and fills in a <code>[]u8</code> instead of an allocator. Using this function, our code looks like:</p>

{% highlight zig %}
fn readConfig(allocator: Allocator, path: []const u8) !Config {
  var buffer = try allocator.alloc(u8, 512);
  const data = try std.fs.cwd().readFile(path, buffer);
  defer allocator.free(buffer);
  ...
{% endhighlight %}

<p>In the above code, <code>data</code> is a slice of <code>buffer</code> which is fitted to the size of file. In our specific case, it doesn't matter if you use <code>readFileAlloc</code> or <code>readFile</code>. I find the first one simpler. The benefit of the <code>readFile</code> though is the ability to re-use buffers.</p>

<p>It turns out that reading a file is straightforward. However, if you're coming from a garbage collected mindset, it's hopefully apparent that [manual] memory management is a significant responsibility. If we did forget to <code>free</code> <code>data</code>, and we did write tests for <code>readConfig</code>, Zig's test runner would detect this and automatically fail. The output would look something like:

{% highlight text %}
zig build test
Test [3/10] test.readConfig... [gpa] (err): memory address 0x1044f8000 leaked:
/opt/zig/lib/std/array_list.zig:391:67: 0x10436fa5f in ensureTotalCapacityPrecise (test)
    const new_memory = try self.allocator.alignedAlloc(T, alignment, new_capacity);
                                                                  ^
/opt/zig/lib/std/array_list.zig:367:51: 0x104366c2f in ensureTotalCapacity (test)
    return self.ensureTotalCapacityPrecise(better_capacity);
                                                  ^
/opt/zig/lib/std/io/reader.zig:74:56: 0x104366607 in readAllArrayListAligned__anon_3814 (test)
    try array_list.ensureTotalCapacity(math.min(max_append_size, 4096));
                                                       ^
/opt/zig/lib/std/fs/file.zig:959:46: 0x10436637b in readToEndAllocOptions__anon_3649 (test)
    self.reader().readAllArrayListAligned(alignment, &array_list, max_bytes) catch |err| switch (err) {
                                             ^
/opt/zig/lib/std/fs.zig:2058:42: 0x104365d0f in readFileAllocOptions__anon_3466 (test)
    return file.readToEndAllocOptions(allocator, max_bytes, stat_size, alignment, optional_sentinel);
                                         ^
/opt/zig/lib/std/fs.zig:2033:41: 0x104365a0f in readFileAlloc (test)
    return self.readFileAllocOptions(allocator, file_path, max_bytes, null, @alignOf(u8), null);
                                        ^
/code/demo.zig:13:41: 0x10436745f in readConfig (test)
    const data = try std.fs.cwd().readFileAlloc(allocator, path, 512);
                                        ^
/code/demo.zig:30:13: 0x104369547 in test.readConfig (test)
    const config = try readConfig(allocator, "config.json");
{% endhighlight %}


<h3>Parsing JSON</h3>
<p>We've read our file and have a string, which in Zig is a <code>[]u8</code>. Populating a <config>Config</config> struct is just a few more lines:</p>

{% highlight zig %}
fn readConfig(allocator: Allocator, path: []const u8) !std.json.Parsed(Config) {
  const data = try std.fs.cwd().readFileAlloc(allocator, path, 512);
  defer allocator.free(data);
  return std.json.parseFromSlice(Config, allocator, data, .{.allocate = .alloc_always});
}
{% endhighlight %}


<p>Notice that we're now returning a <code>std.json.Parsed(Config)</code>. If you think about turning a JSON string into an object, you know that there'll have to be <em>some</em> allocations. At a minimum, we need to allocate an instance of our <code>Config</code>. That's why we pass our allocator. The `Parsed` structure we get back contains a `value` which is our `Config` as well as an arena allocator. This is convenient as it allows us to easily manage the allocated memory by tying the lifetime of the allocation to our `Config` within the `Parsed` structure:</p>

{% highlight zig %}
pub fn main() !void {
  // we have a few choices, but this is a safe default
  var gpa = std.heap.GeneralPurposeAllocator(.{}){};
  const allocator = gpa.allocator();

  const parsed = try readConfig(allocator, "config.json");
  defer parsed.deinit();
  const config = parse.value;
  ...
}
{% endhighlight %}

<aside><p>I wrote a follow up blog post, <a href="/Zigs-std-json-Parsed/">Zig's std.json.Parsed(T)</a>, that explores <code>std.json.Parsed(T)</code> in greater detail.</p></aside>

<p>Also take note of the last parameter we passed to <code>parseFromSlice</code>. This is an optional argument that configures various aspects of the JSON parsing. We're specifying <code>.allocate = .alloc_always</code>, which tells the function to copy any strings from our input and thus have them owned by the Parsed object. The default is <code>alloc_if_needed</code>. Using this option would be more efficient since our parsed <code>Config</code> would reference our input <code>data</code>, rather than having to make a copy, but then we'd have to extend <code>data's</code> lifetime to match our return value. If you're creating many objects with large text values, it's likely something you'd want to consider.</p>

<h3>Default Values</h3>

<p>With the above code, <code>json.parse</code> will fail with an <code>error.MissingField</code> if our json file doesn't have a <code>root</code> field. But what if we wanted to make it optional with a default value? There are two options. The first is to specify the default value in our structure:</p>

{% highlight zig %}
const Config = struct {
  root: []const u8 = "/tmp/demo",
};
{% endhighlight %}


<p>Another option is to use an Optional type and default to <code>null</code>:</p>

{% highlight zig %}
const Config = struct {
  root: ?[]const u8 = null,
};
{% endhighlight %}

<p>When we want to use the value, we can use <code>orelse</code> to set the default:</p>

{% highlight zig %}
lmdb.open(config.root orelse "/tmp/db")
{% endhighlight %}


<h3>Conclusion</h3>
<p>The complete code to read a json config file into a config structure is:</p>

{% highlight zig %}
const std = @import("std");
const Allocator = std.mem.Allocator;

pub fn main() !void {
  var gpa = std.heap.GeneralPurposeAllocator(.{}){};
  const allocator = gpa.allocator();

  const parsed = try readConfig(allocator, "config.json");
  defer parsed.deinit();

  const config = parse.value;
  std.debug.print("config.root: {s}\n", .{config.root});
}

fn readConfig(allocator: Allocator, path: []const u8) !std.json.Parsed(Config) {
  // 512 is the maximum size to read, if your config is larger
  // you should make this bigger.
  const data = try std.fs.cwd().readFileAlloc(allocator, path, 512);
  defer allocator.free(data);
  return std.json.parseFromSlice(Config, allocator, data, .{.allocate = .alloc_always});
}

const Config = struct {
  root: []const u8,
};
{% endhighlight %}

<p>It would be nice to parse our JSON in a single line, and I don't get why <code>readFileAlloc</code> is a member of <code>Dir</code> (but I bet there's a reason). Overall though, it's pretty painless. Hopefully this helps someone :)</p>
